Graphics in R

The R language has extensive graphical capabilities.

Graphics in R may be created by many different methods including base graphics and more advanced plotting packages such as lattice.

The ggplot2 package was created by Hadley Wickham and provides an intuitive plotting system to rapidly generate publication-quality graphics.

ggplot2 builds on the concept of the “Grammar of Graphics” (Wilkinson 2005, Bertin 1983) which describes a consistent syntax for the construction of a wide range of complex graphics by a concise description of their components.

Why use ggplot2

The structured syntax and high level of abstraction used by ggplot2 should allow for the user to concentrate on the visualisations instead of creating the underlying code.

On top of this central philosophy ggplot2 has:

Grammar of Graphics

How to build a plot from its components.

How ggplot2 builds a plot.

Overview of example code for the ggplot2 scatter plot.

ggplot(data = <default data set>, 
       aes(x = <default x axis variable>,
           y = <default y axis variable>,
           ... <other default aesthetic mappings>),
       ... <other plot defaults>) +

  geom_point(aes(size = <size variable for this geom>,
                 ... <other aesthetic mappings>),
             data = <data for this point geom>,
             stat = <statistic string or function>,
             position = <position string or function>,
             color = <"fixed color specification">,
             <other arguments, possibly passed to the _stat_ function>) +

  scale_<aesthetic>_<type>(name = <"scale label">,
                     breaks = <where to put tick marks>,
                     labels = <labels for tick marks>,
                     ... <other options for the scale>) +
  
  ggtitle("Graphics/Plot")+
  xlab("Weight")+
  ylab("Height")+

  theme(plot.title = element_text(colour = "gray"),
        ... <other theme elements>)

What users are required to specify in ggplot2 to build a plot.

Actual code for the ggplot2 scatter plot.

ggplot(data=patients_clean,
       aes(y=Weight,x=Height,colour=Sex,
           size=BMI,shape=Pet)) +
  geom_point()

Getting started with ggplot2

Earlier we worked with the patients dataset to create a “clean and tidy” dataset.

Now we will use a cleaned dataset to demonstrate some of the plotting capabilities of ggplot2.

library(tidyr)
library(ggplot2)
library(dplyr)
library(stringr)
library(lubridate)

patients_clean <- read.delim("patient-data-cleaned.txt",sep="\t")
patients_clean
##            ID        Name      Race    Sex     Smokes Height Weight
## 1   AC/AH/001   Demetrius     White   Male Non-Smoker 182.87  76.57
## 2   AC/AH/017     Rosario     White   Male Non-Smoker 179.12  80.43
## 3   AC/AH/020       Julio     Black   Male Non-Smoker 169.15  75.48
## 4   AC/AH/022        Lupe     White   Male Non-Smoker 175.66  94.54
## 5   AC/AH/029      Lavern     White Female Non-Smoker 164.47  71.78
## 6   AC/AH/033      Bernie      <NA> Female     Smoker 158.27  69.90
## 7   AC/AH/037      Samuel     White Female Non-Smoker 161.69  68.85
## 8   AC/AH/044       Clair     White Female Non-Smoker 165.84  70.44
## 9   AC/AH/045     Shirley     White   Male Non-Smoker 181.32  76.90
## 10  AC/AH/048       Merle  Hispanic   Male Non-Smoker 167.37  79.06
## 11  AC/AH/049      Martin     White Female Non-Smoker 160.06  72.37
## 12  AC/AH/050     Frances     White Female Non-Smoker 166.48  67.34
## 13  AC/AH/052    Courtney     White   Male     Smoker 175.39  92.22
## 14  AC/AH/053     Francis     White Female     Smoker 164.70  75.69
## 15  AC/AH/057      Vernon     White Female     Smoker 163.79  65.76
## 16  AC/AH/061      Lester     Black   Male Non-Smoker 181.13  72.33
## 17  AC/AH/063       Robin  Hispanic   Male Non-Smoker 169.24  73.30
## 18  AC/AH/076      Albert     White   Male Non-Smoker 176.22  97.67
## 19  AC/AH/077       Tommy     Black   Male Non-Smoker 174.09  72.20
## 20  AC/AH/086        Kyle     Black   Male     Smoker 180.11  75.72
## 21  AC/AH/089        Dong     White   Male Non-Smoker 179.24  75.54
## 22  AC/AH/100      Michel     White Female Non-Smoker 161.92  69.92
## 23  AC/AH/104      Jeremy     White   Male     Smoker 169.85  90.63
## 24  AC/AH/112         Pat     Black Female Non-Smoker 160.57  63.54
## 25  AC/AH/113      Eugene     White Female Non-Smoker 168.24  69.57
## 26  AC/AH/114        Kris  Hispanic   Male Non-Smoker 177.75  74.84
## 27  AC/AH/115       Tracy Bi-Racial   Male     Smoker 183.21  83.36
## 28  AC/AH/127        Jame     White   Male Non-Smoker 167.75  82.06
## 29  AC/AH/133       Clyde  Hispanic   Male Non-Smoker 181.15  83.93
## 30  AC/AH/150       Brett     White   Male     Smoker 181.56  79.54
## 31  AC/AH/154        Tony     White Female Non-Smoker 160.03  64.30
## 32  AC/AH/156      George     White   Male Non-Smoker 165.62  76.72
## 33  AC/AH/159      Edward     White   Male Non-Smoker 181.64  96.91
## 34  AC/AH/160        Rory     Asian Female Non-Smoker 159.67  71.88
## 35  AC/AH/164       Shane  Hispanic   Male     Smoker 177.03  74.04
## 36  AC/AH/171       Devin     White Female Non-Smoker 163.35  70.46
## 37  AC/AH/176       Jerry     Asian   Male Non-Smoker 175.21  83.65
## 38  AC/AH/180        Drew     White Female Non-Smoker 160.80  64.77
## 39  AC/AH/185      Ronald     White   Male Non-Smoker 166.46  76.83
## 40  AC/AH/186 Christopher     White Female Non-Smoker 157.95  67.41
## 41  AC/AH/192   Dominique     White   Male Non-Smoker 180.61  83.59
## 42  AC/AH/198         Van     White Female Non-Smoker 159.52  67.99
## 43  AC/AH/207      Bobbie     White Female Non-Smoker 163.01  65.19
## 44  AC/AH/208    Lawrence  Hispanic Female Non-Smoker 165.80  71.77
## 45  AC/AH/210       Keith  Hispanic Female     Smoker 170.03  66.68
## 46  AC/AH/211         Son     White Female Non-Smoker 157.16  69.64
## 47  AC/AH/213     Charlie     White Female     Smoker 164.58  72.99
## 48  AC/AH/219         Jay     White Female Non-Smoker 163.47  72.89
## 49  AC/AH/220     Richard     White   Male Non-Smoker 185.43  87.23
## 50  AC/AH/221      Carlos     White Female Non-Smoker 165.34  70.84
## 51  AC/AH/225        Gail     White Female Non-Smoker 163.45  67.67
## 52  AC/AH/233      Marion     White Female Non-Smoker 163.97  66.71
## 53  AC/AH/241     Lindsay     White Female Non-Smoker 161.38  73.55
## 54  AC/AH/244        Sean     White Female Non-Smoker 160.09  65.93
## 55  AC/AH/248      Andrea     White   Male Non-Smoker 178.64  97.05
## 56  AC/AH/249       Jesus  Hispanic Female     Smoker 159.78  68.31
## 57  AC/SG/002         Jan     White Female     Smoker 161.57  67.92
## 58  AC/SG/003      Walter     White Female Non-Smoker 161.83  66.03
## 59  AC/SG/008        Dana     White   Male     Smoker 169.66  77.30
## 60  AC/SG/009       Sammy     White   Male Non-Smoker 166.84  88.25
## 61  AC/SG/010        Theo     Asian Female Non-Smoker 159.32  64.92
## 62  AC/SG/015       Shaun     White   Male     Smoker 170.51  84.35
## 63  AC/SG/016      Jimmie     Black Female Non-Smoker 161.84  69.97
## 64  AC/SG/046        Carl  Hispanic   Male Non-Smoker 171.41  81.70
## 65  AC/SG/055        Evan     White   Male Non-Smoker 166.75  79.06
## 66  AC/SG/056     Merrill     Asian Female     Smoker 166.19  67.46
## 67  AC/SG/064         Jon     White   Male Non-Smoker 169.16  90.08
## 68  AC/SG/065      Shayne     White Female Non-Smoker 157.01  66.56
## 69  AC/SG/067      Thomas     White   Male Non-Smoker 167.51  84.15
## 70  AC/SG/068   Valentine  Hispanic Female Non-Smoker 160.47  68.20
## 71  AC/SG/072     Cameron     Black Female     Smoker 162.33  66.47
## 72  AC/SG/074       Eddie  Hispanic   Male Non-Smoker 175.67  88.82
## 73  AC/SG/084       Brian  Hispanic   Male Non-Smoker 174.25  80.93
## 74  AC/SG/095     Matthew     White Female Non-Smoker 158.94  65.14
## 75  AC/SG/099      Leslie     Asian   Male Non-Smoker 172.72  67.62
## 76  AC/SG/101       Jason     White Female Non-Smoker 159.23  69.96
## 77  AC/SG/107         Sol     White   Male Non-Smoker 176.54  90.76
## 78  AC/SG/116      Connie     Black   Male Non-Smoker 184.34  90.41
## 79  AC/SG/121        Rudy     White Female Non-Smoker 163.94  71.47
## 80  AC/SG/122      Michal  Hispanic Female Non-Smoker 160.09  68.94
## 81  AC/SG/123     Darnell     White Female     Smoker 162.32  72.72
## 82  AC/SG/134       Daryl     White Female     Smoker 162.59  69.76
## 83  AC/SG/139      Jordan     White   Male Non-Smoker 171.94  82.11
## 84  AC/SG/142     Kenneth     White Female Non-Smoker 158.07  69.80
## 85  AC/SG/155     Raymond     White Female Non-Smoker 158.35  69.72
## 86  AC/SG/165       Elmer     White Female Non-Smoker 162.18  67.81
## 87  AC/SG/167       Jimmy     White Female Non-Smoker 159.38  70.37
## 88  AC/SG/172     Whitney     White   Male Non-Smoker 171.45  84.29
## 89  AC/SG/173       Britt     White Female     Smoker 163.17  64.47
## 90  AC/SG/179       Logan     White   Male Non-Smoker 183.10  82.47
## 91  AC/SG/181       Terry  Hispanic   Male Non-Smoker 177.14  88.70
## 92  AC/SG/182       Jamie  Hispanic   Male     Smoker 171.08  72.51
## 93  AC/SG/191        Lacy  Hispanic Female Non-Smoker 159.33  70.68
## 94  AC/SG/193      Ronnie     White   Male     Smoker 185.43  73.63
## 95  AC/SG/194      Joseph     White Female Non-Smoker 162.65  73.99
## 96  AC/SG/197       Stacy     White Female Non-Smoker 159.44  66.21
## 97  AC/SG/204     Anthony     White Female Non-Smoker 164.11  70.66
## 98  AC/SG/216        Alva     White Female Non-Smoker 159.13  66.96
## 99  AC/SG/217        Dean     White Female Non-Smoker 160.58  71.49
## 100 AC/SG/234        Luis  Hispanic Female Non-Smoker 164.88  68.07
##          Birth          State   Pet Grade  Died Count Date.Entered.Study
## 1   1972-02-06        Georgia   Dog     2 FALSE  0.01         2015-12-01
## 2   1972-06-15       Missouri   Dog     2 FALSE -1.31                   
## 3   1972-07-09   Pennsylvania  None     2 FALSE -0.17                   
## 4   1972-08-17        Florida   Cat     1 FALSE -1.10                   
## 5   1973-06-12           Iowa  <NA>     2  TRUE  1.42                   
## 6   1973-07-01       Maryland   Dog     2 FALSE  0.29                   
## 7   1972-03-26   Pennsylvania  None     1 FALSE  0.16                   
## 8   1973-05-11 North Carolina  None     1 FALSE -0.07                   
## 9   1971-12-31      Louisiana   Dog     1 FALSE -1.43                   
## 10  1973-07-19 North Carolina  None     2 FALSE  0.54         2015-12-31
## 11  1972-05-04     California Horse     2  TRUE -2.41                   
## 12  1971-11-14       Michigan  None     1 FALSE  1.05                   
## 13  1972-03-22        Indiana  Bird     3 FALSE -0.04                   
## 14  1971-11-22       Virginia   Dog     1 FALSE -0.65                   
## 15  1972-01-12       Illinois   Cat     3 FALSE  0.06                   
## 16  1972-11-22      Wisconsin   Dog    NA  TRUE -0.15                   
## 17  1971-11-22       Illinois  None     3 FALSE  0.68                   
## 18  1973-04-14      Louisiana   Cat     2 FALSE -1.97                   
## 19  1973-02-07     Washington   Cat     3 FALSE -1.18                   
## 20  1973-05-18        Georgia   Cat     3 FALSE  0.40                   
## 21  1972-03-17     California  None     2  TRUE -0.47                   
## 22  1973-01-02        Georgia   Dog     1 FALSE  0.50                   
## 23  1972-04-18       Kentucky  None     1  TRUE -0.20                   
## 24  1973-07-02     California  <NA>    NA  TRUE  0.69         2016-01-31
## 25  1972-02-13  Massachusetts  <NA>     2 FALSE  0.38                   
## 26  1972-11-25   Pennsylvania  Bird     3 FALSE  0.15                   
## 27  1973-10-05     California   Dog     2 FALSE  0.05                   
## 28  1972-11-04          Texas   Dog     1  TRUE  0.63                   
## 29  1973-10-19     Washington   Cat     3  TRUE -0.81         2016-03-02
## 30  1972-05-09       Kentucky   Dog     1  TRUE  0.92                   
## 31  1973-09-05     California   Dog     1  TRUE -0.10                   
## 32  1972-07-15     California   Dog     1  TRUE  0.61                   
## 33  1972-12-10    Connecticut   Cat     2 FALSE -0.39                   
## 34  1973-09-28        Florida   Cat     2  TRUE  0.74                   
## 35  1972-02-24        Florida  None     2 FALSE  0.15                   
## 36  1973-04-22     California  Bird     3  TRUE  1.68         2016-03-31
## 37  1973-05-07       Virginia   Dog     3  TRUE  0.41                   
## 38  1973-02-24         Oregon   Cat     1  TRUE -2.17                   
## 39  1972-08-23       Colorado  None    NA  TRUE -1.32                   
## 40  1972-05-12     New Jersey   Dog     3  TRUE -1.89                   
## 41  1972-03-30       Michigan  None     3  TRUE -0.60                   
## 42  1972-12-08       Missouri   Cat     2 FALSE -0.40                   
## 43  1973-05-23        Florida   Dog     2 FALSE  1.07                   
## 44  1973-08-13      Louisiana  None     1 FALSE -1.21                   
## 45  1972-09-03       New York   Dog    NA FALSE -0.66                   
## 46  1973-07-20     California   Cat     2  TRUE -0.70         2016-05-01
## 47  1972-01-31      Louisiana   Dog     1 FALSE -0.32                   
## 48  1972-04-13 North Caroline  Bird     1  TRUE  0.33                   
## 49  1973-07-19        Florida   Cat     1 FALSE  1.29                   
## 50  1972-02-07       Michigan   Dog    NA  TRUE  1.69                   
## 51  1972-10-27       Michigan   Cat     2 FALSE -0.74                   
## 52  1971-12-29           Ohio   Cat     3  TRUE -1.07                   
## 53  1972-02-14        Florida   Cat     3 FALSE -0.82         2016-05-31
## 54  1973-01-31       Maryland  None    NA  TRUE  0.50                   
## 55  1973-01-18        Indiana   Cat     1  TRUE -1.21                   
## 56  1972-04-29        Alabama   Cat     2  TRUE  0.52                   
## 57  1973-07-09        Arizona   Dog     3 FALSE  0.33                   
## 58  1972-07-17         Oregon  None     2  TRUE -0.90                   
## 59  1973-06-01         Nevada   Dog     1  TRUE -2.36                   
## 60  1972-03-10        Vermont   Dog     1 FALSE  1.51         2016-07-01
## 61  1973-02-04       New York   Cat     2  TRUE  0.71                   
## 62  1972-11-15     New Jersey   Dog     3  TRUE  0.12                   
## 63  1972-04-09        Arizona   Cat     3  TRUE -0.57                   
## 64  1973-08-11    Mississippi  Bird     2  TRUE  0.43                   
## 65  1972-03-01       Illinois  Bird     3  TRUE -0.55         2016-07-31
## 66  1972-12-03        Indiana  <NA>     3  TRUE -1.06                   
## 67  1972-10-10       Illinois   Cat     2  TRUE -0.10                   
## 68  1972-04-11     California   Dog     3  TRUE  0.64                   
## 69  1972-07-25   Pennsylvania  Bird     2  TRUE -1.46                   
## 70  1972-04-21      Tennessee   Cat     3  TRUE  0.83                   
## 71  1972-02-21      Californa  <NA>     3 FALSE  1.02                   
## 72  1973-10-10        Georgia  None    NA FALSE -3.14                   
## 73  1972-03-12       Virginia   Dog     2  TRUE  1.79                   
## 74  1973-06-01         Hawaii   Dog     1 FALSE  0.47                   
## 75  1972-02-10           Ohio   Cat     1 FALSE  0.05                   
## 76  1973-10-04       Michigan   Dog     2  TRUE -0.81                   
## 77  1973-02-03         Hawaii  None     3 FALSE -1.48         2016-08-31
## 78  1972-06-11        Florida  None     3  TRUE -1.41                   
## 79  1973-03-18       Michigan   Cat     3 FALSE -0.25                   
## 80  1971-12-22 South Carolina   Dog     1 FALSE -0.99                   
## 81  1972-09-09 North Caroline  Bird     1  TRUE -0.92                   
## 82  1972-06-03          Texas   Cat     2  TRUE -0.28                   
## 83  1973-10-12       Michigan  None     1 FALSE -1.18                   
## 84  1972-05-21         Kansas   Dog     3 FALSE  0.54                   
## 85  1972-06-08     California   Cat     3  TRUE  1.42                   
## 86  1972-03-31     Washington  Bird     1  TRUE  1.68                   
## 87  1973-10-06     Washington  None     2  TRUE  0.02         2016-10-01
## 88  1972-03-02        Florida   Dog     2  TRUE -0.19                   
## 89  1973-06-25      Californa  None     2 FALSE  0.69                   
## 90  1972-10-30           Ohio   Dog     3  TRUE  0.77                   
## 91  1971-11-30        Indiana   Cat     3  TRUE  1.76                   
## 92  1973-03-31      Louisiana  None     3  TRUE -0.22                   
## 93  1973-06-27          Texas  None     3  TRUE -1.07                   
## 94  1973-06-11           Iowa   Dog     3 FALSE -0.29                   
## 95  1972-08-10       Maryland   Cat     3 FALSE  0.87         2016-10-31
## 96  1972-11-14       New York   Cat     1  TRUE  1.21                   
## 97  1972-06-23     California   Dog     3 FALSE -0.17                   
## 98  1972-06-25        Alabama  None     1  TRUE  0.95                   
## 99  1972-11-17           Ohio  None     1  TRUE -0.78                   
## 100 1971-11-16   Pennsylvania   Cat     3  TRUE  0.35                   
##     Age   BMI Overweight
## 1    44 22.90      FALSE
## 2    43 25.07       TRUE
## 3    43 26.38       TRUE
## 4    43 30.64       TRUE
## 5    42 26.54       TRUE
## 6    42 27.90       TRUE
## 7    44 26.34       TRUE
## 8    42 25.61       TRUE
## 9    44 23.39      FALSE
## 10   42 28.22       TRUE
## 11   44 28.25       TRUE
## 12   44 24.30      FALSE
## 13   44 29.98       TRUE
## 14   44 27.90       TRUE
## 15   44 24.51      FALSE
## 16   43 22.05      FALSE
## 17   44 25.59       TRUE
## 18   43 31.45       TRUE
## 19   43 23.82      FALSE
## 20   42 23.34      FALSE
## 21   44 23.51      FALSE
## 22   43 26.67       TRUE
## 23   44 31.42       TRUE
## 24   42 24.64      FALSE
## 25   44 24.58      FALSE
## 26   43 23.69      FALSE
## 27   42 24.83      FALSE
## 28   43 29.16       TRUE
## 29   42 25.58       TRUE
## 30   44 24.13      FALSE
## 31   42 25.11       TRUE
## 32   43 27.97       TRUE
## 33   43 29.37       TRUE
## 34   42 28.19       TRUE
## 35   44 23.63      FALSE
## 36   43 26.41       TRUE
## 37   43 27.25       TRUE
## 38   43 25.05       TRUE
## 39   43 27.73       TRUE
## 40   43 27.02       TRUE
## 41   44 25.63       TRUE
## 42   43 26.72       TRUE
## 43   42 24.53      FALSE
## 44   42 26.11       TRUE
## 45   43 23.06      FALSE
## 46   42 28.20       TRUE
## 47   44 26.95       TRUE
## 48   44 27.28       TRUE
## 49   42 25.37       TRUE
## 50   44 25.91       TRUE
## 51   43 25.33       TRUE
## 52   44 24.81      FALSE
## 53   44 28.24       TRUE
## 54   43 25.72       TRUE
## 55   43 30.41       TRUE
## 56   44 26.76       TRUE
## 57   42 26.02       TRUE
## 58   43 25.21       TRUE
## 59   42 26.85       TRUE
## 60   44 31.70       TRUE
## 61   43 25.58       TRUE
## 62   43 29.01       TRUE
## 63   44 26.71       TRUE
## 64   42 27.81       TRUE
## 65   44 28.43       TRUE
## 66   43 24.43      FALSE
## 67   43 31.48       TRUE
## 68   44 27.00       TRUE
## 69   43 29.99       TRUE
## 70   44 26.48       TRUE
## 71   44 25.22       TRUE
## 72   42 28.78       TRUE
## 73   44 26.65       TRUE
## 74   42 25.79       TRUE
## 75   44 22.67      FALSE
## 76   42 27.59       TRUE
## 77   43 29.12       TRUE
## 78   43 26.61       TRUE
## 79   43 26.59       TRUE
## 80   44 26.90       TRUE
## 81   43 27.60       TRUE
## 82   43 26.39       TRUE
## 83   42 27.77       TRUE
## 84   43 27.94       TRUE
## 85   43 27.80       TRUE
## 86   44 25.78       TRUE
## 87   42 27.70       TRUE
## 88   44 28.67       TRUE
## 89   42 24.21      FALSE
## 90   43 24.60      FALSE
## 91   44 28.27       TRUE
## 92   43 24.77      FALSE
## 93   42 27.84       TRUE
## 94   42 21.41      FALSE
## 95   43 27.97       TRUE
## 96   43 26.05       TRUE
## 97   43 26.24       TRUE
## 98   43 26.44       TRUE
## 99   43 27.72       TRUE
## 100  44 25.04       TRUE

Our first ggplot2 graph

As seen above, in order to produce a ggplot2 graph we need a minimum of:-

  • Data to be used in graph
  • Mappings of data to the graph (aesthetic mapping)
  • What type of graph we want to use (The geom to use).

In the code below we define the data as our cleaned patients data frame.

pcPlot <- ggplot(data=patients_clean)
class(pcPlot)
## [1] "gg"     "ggplot"
pcPlot$data[1:4,]
##          ID      Name  Race  Sex     Smokes Height Weight      Birth
## 1 AC/AH/001 Demetrius White Male Non-Smoker 182.87  76.57 1972-02-06
## 2 AC/AH/017   Rosario White Male Non-Smoker 179.12  80.43 1972-06-15
## 3 AC/AH/020     Julio Black Male Non-Smoker 169.15  75.48 1972-07-09
## 4 AC/AH/022      Lupe White Male Non-Smoker 175.66  94.54 1972-08-17
##          State  Pet Grade  Died Count Date.Entered.Study Age   BMI
## 1      Georgia  Dog     2 FALSE  0.01         2015-12-01  44 22.90
## 2     Missouri  Dog     2 FALSE -1.31                     43 25.07
## 3 Pennsylvania None     2 FALSE -0.17                     43 26.38
## 4      Florida  Cat     1 FALSE -1.10                     43 30.64
##   Overweight
## 1      FALSE
## 2       TRUE
## 3       TRUE
## 4       TRUE

Now we can see that we have a gg/ggplot object (pcPlot) in which the data has been defined.

However, how to map the data to the visual properties (aesthetics) of the plot, and what type of plot to use (geom), have yet to be specified.

pcPlot$mapping
## * NULL ->
pcPlot$theme
## list()
pcPlot$layers
## list()

The information to map the data to the plot can be added now using the aes() function.

pcPlot <- ggplot(data=patients_clean)

pcPlot <- pcPlot+aes(x=Height,y=Weight)

pcPlot$mapping
## * x -> Height
## * y -> Weight
pcPlot$theme
## list()
pcPlot$layers
## list()

But we are still missing the final component of our plot, the type of plot to use (geom).

Below, the geom_point() function is used to specify a point plot: a scatter plot of Height values on the x-axis versus Weight values on the y-axis.

pcPlot <- ggplot(data=patients_clean)

pcPlot <- pcPlot+aes(x=Height,y=Weight)
pcPlot <- pcPlot+geom_point()

pcPlot$mapping
## * x -> Height
## * y -> Weight
pcPlot$theme
## list()
pcPlot$layers
## [[1]]
## geom_point: na.rm = FALSE
## stat_identity: na.rm = FALSE
## position_identity

Now that we have all the components of our plot, we can display the results.

pcPlot

More typically, the data and aesthetics are defined within the ggplot() function and geoms are applied afterwards.

pcPlot <- ggplot(data=patients_clean,
                 mapping=aes(x=Height,y=Weight))
pcPlot+geom_point()
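The two construction styles can be compared directly. The sketch below is illustrative only: toy_patients is a hypothetical four-row stand-in for patients_clean, and layer_data() is used to pull out the values the point layer would draw so that the step-by-step and single-call versions can be checked against each other.

```r
library(ggplot2)

# Hypothetical stand-in for patients_clean (values invented for illustration)
toy_patients <- data.frame(
  Height = c(182.9, 179.1, 164.5, 158.3),
  Weight = c(76.6, 80.4, 71.8, 69.9)
)

# Step-by-step construction: data, then mapping, then geom
step_plot <- ggplot(data = toy_patients)
step_plot <- step_plot + aes(x = Height, y = Weight)
step_plot <- step_plot + geom_point()

# The same plot defined in a single expression
short_plot <- ggplot(toy_patients, aes(x = Height, y = Weight)) +
  geom_point()

# Extract the data each point layer would draw and compare
step_points <- layer_data(step_plot, 1)
short_points <- layer_data(short_plot, 1)
same_x <- isTRUE(all.equal(step_points$x, short_points$x))
same_y <- isTRUE(all.equal(step_points$y, short_points$y))
same_x && same_y
```

Both objects describe the same plot; the single-call form is simply more compact.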

Geoms - Plot types

As we have seen, an important element of a ggplot is the geom used. Following the specification of data, the geom describes the type of plot used.

Several geoms are available in ggplot2:-

Geoms - Line plots

pcPlot <- ggplot(data=patients_clean,
        mapping=aes(x=Height,y=Weight))

pcPlot_line <- pcPlot+geom_line() 

pcPlot_line

pcPlot <- ggplot(data=patients_clean,
        mapping=aes(x=Height,y=Weight))

pcPlot_smooth <- pcPlot+geom_smooth() 

pcPlot_smooth
## `geom_smooth()` using method = 'loess'

Geoms - Bar and frequency plots

pcPlot <- ggplot(data=patients_clean,
        mapping=aes(x=Sex))

pcPlot_bar <- pcPlot+geom_bar() 

pcPlot_bar

pcPlot <- ggplot(data=patients_clean,
        mapping=aes(x=Height))

pcPlot_hist <- pcPlot+geom_histogram() 

pcPlot_hist
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
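The message above suggests choosing a bin width explicitly. A minimal sketch, assuming a hypothetical toy_heights data frame and an arbitrary bin width of 5 cm (neither comes from the patients data):

```r
library(ggplot2)

# Hypothetical heights in cm, invented for illustration
toy_heights <- data.frame(
  Height = c(157.0, 159.5, 161.7, 163.4, 164.5,
             166.8, 169.2, 175.4, 179.1, 182.9)
)

# Set the bin width explicitly instead of relying on the default of 30 bins
hist_plot <- ggplot(toy_heights, aes(x = Height)) +
  geom_histogram(binwidth = 5)

# The computed bin counts should account for every observation
bin_counts <- layer_data(hist_plot, 1)$count
sum(bin_counts)

hist_plot
```

A suitable binwidth depends on the spread and size of the data, so it is worth trying a few values.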

pcPlot <- ggplot(data=patients_clean,
        mapping=aes(x=Height))

pcPlot_density <- pcPlot+geom_density() 

pcPlot_density

Geoms - Box and violin plots

pcPlot <- ggplot(data=patients_clean,
        mapping=aes(x=Sex,y=Height))

pcPlot_boxplot <- pcPlot+geom_boxplot() 

pcPlot_boxplot

pcPlot <- ggplot(data=patients_clean,
        mapping=aes(x=Sex,y=Height))

pcPlot_violin <- pcPlot+geom_violin() 

pcPlot_violin

An overview of geoms and their arguments can be found in the ggplot2 documentation or within the ggplot2 cheatsheet.

-ggplot2 documentation

-ggplot2 Cheatsheet

Aesthetics

To set an aesthetic of a plot to a constant value (e.g. colour all points red) we can supply the colour argument to the geom_point() function.

pcPlot <- ggplot(data=patients_clean,
                 mapping=aes(x=Height,y=Weight))
pcPlot+geom_point(colour="red")

As we discussed earlier however, ggplot2 makes use of aesthetic mappings to assign variables in the data to the properties/aesthetics of the plot. This allows the properties of the plot to reflect variables in the data dynamically.

In these examples we supply additional information to the aes() function to define what information to display and how it is represented in the plot.

First we can recreate the plot we saw earlier.

pcPlot <- ggplot(data=patients_clean,
                 mapping=aes(x=Height,y=Weight))
pcPlot+geom_point()

Now we can adjust the aes mapping by supplying an argument to the colour parameter in the aes function. (Note that ggplot2 accepts “color” or “colour” as the parameter name.)

This simple adjustment allows for identification of the separation between male and female measurements for height and weight.

pcPlot <- ggplot(data=patients_clean,
                 mapping=aes(x=Height,y=Weight,colour=Sex))
pcPlot+geom_point()

Similarly the shape of points may be adjusted.

pcPlot <- ggplot(data=patients_clean,
                 mapping=aes(x=Height,y=Weight,shape=Sex))
pcPlot+geom_point()
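The difference between setting an aesthetic to a constant and mapping it to a variable can be seen in the data ggplot2 computes for the layer. This is a sketch under stated assumptions: toy_patients is a hypothetical stand-in for patients_clean, and layer_data() is used only to inspect the colours assigned to each point.

```r
library(ggplot2)

# Hypothetical stand-in for patients_clean (values invented for illustration)
toy_patients <- data.frame(
  Height = c(182.9, 179.1, 164.5, 158.3, 161.7, 175.4),
  Weight = c(76.6, 80.4, 71.8, 69.9, 68.9, 92.2),
  Sex    = c("Male", "Male", "Female", "Female", "Female", "Male")
)

base_plot <- ggplot(toy_patients, aes(x = Height, y = Weight))

# Setting: colour is a constant supplied outside aes()
fixed_colour <- base_plot + geom_point(colour = "red")

# Mapping: colour is driven by a variable inside aes()
mapped_colour <- base_plot + geom_point(aes(colour = Sex))

# Inspect the colours each layer would draw
fixed_values <- unique(layer_data(fixed_colour, 1)$colour)
mapped_values <- unique(layer_data(mapped_colour, 1)$colour)
fixed_values
mapped_values
```

The fixed version uses a single colour and draws no legend, while the mapped version assigns one colour per level of Sex and adds a legend.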

Aesthetic mappings may also be set directly in the geom_point() function, in the same place we previously specified red. This allows the same ggplot object to be used with different aesthetic mappings and varying geoms.

pcPlot <- ggplot(data=patients_clean)
pcPlot+geom_point(aes(x=Height,y=Weight,colour=Sex))

pcPlot+geom_point(aes(x=Height,y=Weight,colour=Smokes))

pcPlot+geom_point(aes(x=Height,y=Weight,colour=Smokes,shape=Sex))

pcPlot+geom_violin(aes(x=Sex,y=Height,fill=Smokes))

Again, for a comprehensive list of parameters and aesthetic mappings used in geom_type functions see the ggplot2 documentation for individual geoms by using ?geom_type

?geom_point

or visit the ggplot2 documentation pages and cheatsheet.

Facets

One very useful feature of ggplot is faceting. This allows you to produce plots subset by variables in your data.

To facet our data into multiple plots we can use the facet_wrap or facet_grid function specifying the variable we split by.

The facet_grid function is well suited to splitting the data by two factors.

Here we can plot the data with the Smokes variable as rows and Sex variable as columns.

facet_grid(Rows~Columns)

pcPlot <- ggplot(data=patients_clean,aes(x=Height,y=Weight,colour=Sex))+geom_point()
pcPlot + facet_grid(Smokes~Sex)

To split by one factor we can apply the facet_grid() function, omitting the variable before the “~” to facet along columns in the plot.

facet_grid(~Columns)

pcPlot <- ggplot(data=patients_clean,aes(x=Height,y=Weight,colour=Sex))+geom_point()
pcPlot + facet_grid(~Sex)

To split along rows in the plot, the variable is placed before the “~” and a “.” is placed after it.

facet_grid(Rows~.)

pcPlot <- ggplot(data=patients_clean,aes(x=Height,y=Weight,colour=Sex))+geom_point()
pcPlot + facet_grid(Sex~.)

The facet_wrap() function offers a less grid-based structure but is well suited to faceting data by one variable.

For facet_wrap() we follow a similar syntax to facet_grid().

pcPlot <- ggplot(data=patients_clean,aes(x=Height,y=Weight,colour=Sex))+geom_point()
pcPlot + facet_wrap(~Smokes)

For more complex faceting both facet_grid and facet_wrap can accept combinations of variables.

Using facet_wrap

pcPlot <- ggplot(data=patients_clean,aes(x=Height,y=Weight,colour=Sex))+geom_point()
pcPlot + facet_wrap(~Pet+Smokes+Sex)

Or in a nice grid format using facet_grid(), with the Smokes variable against a combination of Sex and Pet.

pcPlot + facet_grid(Smokes~Sex+Pet)
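To see how a facet formula splits the data, one option is to count the panels that receive points. The sketch below assumes a hypothetical toy_patients data frame containing every Smokes and Sex combination; the PANEL column returned by layer_data() records which panel each point is drawn in.

```r
library(ggplot2)

# Hypothetical stand-in for patients_clean (values invented for illustration)
toy_patients <- data.frame(
  Height = c(182.9, 179.1, 164.5, 158.3, 161.7, 175.4),
  Weight = c(76.6, 80.4, 71.8, 69.9, 68.9, 92.2),
  Sex    = c("Male", "Male", "Female", "Female", "Female", "Male"),
  Smokes = c("Non-Smoker", "Non-Smoker", "Non-Smoker",
             "Smoker", "Non-Smoker", "Smoker")
)

base_plot <- ggplot(toy_patients, aes(x = Height, y = Weight)) +
  geom_point()

# Rows by Smokes, columns by Sex
grid_plot <- base_plot + facet_grid(Smokes ~ Sex)

# Columns only
column_plot <- base_plot + facet_grid(~ Sex)

# Count the panels that contain at least one point
grid_panels <- length(unique(layer_data(grid_plot, 1)$PANEL))
column_panels <- length(unique(layer_data(column_plot, 1)$PANEL))
grid_panels
column_panels
```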

Exercise set 1

Link_to_exercises

Link_to_exercises_with_images

Link_to_Rmarkdown template

Link_to_answers

Scales

Scales and their legends have so far been handled using ggplot2 defaults. ggplot2 offers functionality to have finer control over scales and legends using the scale methods.

Scale methods are divided into functions by combinations of

scale_aesthetic_type

Try typing in scale_ then tab to autocomplete. This will provide some examples of the scale functions available in ggplot2.

Although different scale functions accept some variety in their arguments, common arguments to scale functions include -

Controlling the X and Y scale.

Both continuous and discrete X/Y scales can be controlled in ggplot2 using the

scale_(x/y)_(continuous/discrete)

In this example we control the continuous scale on the x-axis by providing a name, x-axis limits, the positions of breaks (ticks/labels) and the labels to place at those breaks.

pcPlot +
  geom_point() +
  facet_grid(Smokes~Sex)+
  scale_x_continuous(name="height ('cm')",
                     limits = c(100,200),
                     breaks=c(125,150,175),
                     labels=c("small","justright","tall"))
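Breaks and labels do not have to be typed by hand. A minimal sketch, assuming a hypothetical toy_patients data frame and an arbitrary 10 cm spacing, builds evenly spaced breaks with seq() and derives the labels from them so the two vectors always have the same length:

```r
library(ggplot2)

# Hypothetical stand-in for patients_clean (values invented for illustration)
toy_patients <- data.frame(
  Height = c(182.9, 179.1, 164.5, 158.3),
  Weight = c(76.6, 80.4, 71.8, 69.9)
)

# Evenly spaced breaks and matching labels
height_breaks <- seq(150, 190, by = 10)
height_labels <- paste(height_breaks, "cm")

axis_plot <- ggplot(toy_patients, aes(x = Height, y = Weight)) +
  geom_point() +
  scale_x_continuous(name = "Height",
                     limits = c(150, 190),
                     breaks = height_breaks,
                     labels = height_labels)
axis_plot
```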

Similarly, control over discrete scales is shown below.

pcPlot <- ggplot(data=patients_clean,aes(x=Sex,y=Height))
pcPlot +
  geom_violin(aes(x=Sex,y=Height)) +
  scale_x_discrete(labels=c("Girls","Guys"))

Multiple X/Y scales can be combined to give full control of axis marks.

pcPlot <- ggplot(data=patients_clean,aes(x=Sex,y=Height,fill=Smokes))
pcPlot +
  geom_violin(aes(x=Sex,y=Height)) +
  scale_x_discrete(labels=c("Girls","Guys"))+
  scale_y_continuous(breaks=c(160,180),labels=c("Petite","Tall"))
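Positional labels are matched to the levels in their default order (alphabetical for character data, so Female then Male here), which makes it easy to attach a label to the wrong group. A minimal sketch, assuming a hypothetical toy_patients data frame, uses a named vector so each label is tied to its level explicitly:

```r
library(ggplot2)

# Hypothetical stand-in for patients_clean (values invented for illustration)
toy_patients <- data.frame(
  Height = c(182.9, 179.1, 175.4, 164.5, 158.3, 161.7),
  Sex    = c("Male", "Male", "Male", "Female", "Female", "Female")
)

# Default order of the discrete levels on the axis
default_order <- sort(unique(toy_patients$Sex))
default_order

# Names identify the level, values give the label to display
sex_labels <- c(Female = "Girls", Male = "Guys")

violin_plot <- ggplot(toy_patients, aes(x = Sex, y = Height)) +
  geom_violin() +
  scale_x_discrete(labels = sex_labels)
violin_plot
```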

Controlling other scales.

When using fill, colour, linetype, shape, size or alpha aesthetic mappings, the scales are automatically selected for you and the appropriate legends created.

pcPlot <- ggplot(data=patients_clean,
                 aes(x=Height,y=Weight,colour=Sex))
pcPlot + geom_point(size=4)

In the above example the discrete colours for the Sex variable were selected by default.

Manual discrete colour scale

Manual control of discrete variables can be performed using the scale_<aesthetic>_manual functions (here scale_color_manual) with the values parameter. Additionally, in this example an updated name for the legend is provided.

pcPlot <- ggplot(data=patients_clean,
                 aes(x=Height,y=Weight,colour=Sex))
pcPlot + geom_point(size=4) + 
  scale_color_manual(values = c("Green","Purple"),
                     name="Gender")
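The values supplied to a manual scale can also be named, in which case colours are matched to levels by name and not by position. This is a sketch only: toy_patients is a hypothetical stand-in for patients_clean and the colour choices are arbitrary.

```r
library(ggplot2)

# Hypothetical stand-in for patients_clean (values invented for illustration)
toy_patients <- data.frame(
  Height = c(182.9, 179.1, 164.5, 158.3),
  Weight = c(76.6, 80.4, 71.8, 69.9),
  Sex    = c("Male", "Male", "Female", "Female")
)

# Named vector: each level of Sex is given its colour explicitly
sex_colours <- c(Female = "green", Male = "purple")

manual_plot <- ggplot(toy_patients, aes(x = Height, y = Weight, colour = Sex)) +
  geom_point(size = 4) +
  scale_colour_manual(values = sex_colours, name = "Gender")

# Colours assigned to the points by the manual scale
used_colours <- unique(unname(layer_data(manual_plot, 1)$colour))
used_colours

manual_plot
```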

Colorbrewer for discrete colour scale

Here we have specified the colours to be used (hence the manual), but when a variable has many levels this may be impractical, and often we would like ggplot2 to choose colours from a scale of our choice.

The brewer set of scale functions allows the user to make use of a range of palettes available from colorbrewer.

  • Diverging

BrBG, PiYG, PRGn, PuOr, RdBu, RdGy, RdYlBu, RdYlGn, Spectral

  • Qualitative

Accent, Dark2, Paired, Pastel1, Pastel2, Set1, Set2, Set3

  • Sequential

Blues, BuGn, BuPu, GnBu, Greens, Greys, Oranges, OrRd, PuBu, PuBuGn, PuRd, Purples, RdPu, Reds, YlGn, YlGnBu, YlOrBr, YlOrRd

pcPlot <- ggplot(data=patients_clean,
                 aes(x=Height,y=Weight,colour=Pet))
pcPlot + geom_point(size=4) + 
  scale_color_brewer(palette = "Set2")
## Warning: Removed 5 rows containing missing values (geom_point).

For more details on palette sizes and styles visit the colorbrewer website and ggplot2 reference page.

Continuous colour scales

So far we have looked at qualitative scales, but ggplot2 offers much functionality for continuous scales such as size, alpha (transparency), colour and fill.

  • scale_alpha_continuous() - For transparency

  • scale_size_continuous() - For control of size.

Both these functions accept the range of alpha/size to be used in plotting.

Below, the range of alpha used in the plot is limited to between 0.5 and 1.

pcPlot <- ggplot(data=patients_clean,
                 aes(x=Height,y=Weight,alpha=BMI))
pcPlot + geom_point(size=4) + 
  scale_alpha_continuous(range = c(0.5,1))

Below, the range of sizes used in the plot is limited to between 3 and 6.

pcPlot <- ggplot(data=patients_clean,
                 aes(x=Height,y=Weight,size=BMI))
pcPlot + geom_point(alpha=0.8) + 
  scale_size_continuous(range = c(3,6))
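To see what range does, the sketch below uses a small made-up data frame (toy, a hypothetical stand-in for patients_clean) and reads back the point sizes that the scale assigned:

```r
library(ggplot2)

# Hypothetical stand-in for patients_clean (not the course data).
toy <- data.frame(
  Height = c(160, 172, 181, 158, 190, 167),
  Weight = c(55, 70, 85, 52, 95, 63),
  BMI    = c(21.5, 23.7, 25.9, 20.8, 26.3, 22.6)
)

p <- ggplot(toy, aes(x = Height, y = Weight, size = BMI)) +
  geom_point(alpha = 0.8) +
  scale_size_continuous(range = c(3, 6))

# Smallest and largest point size after the scale has been applied.
size_range <- range(layer_data(p, 1)$size)
```

The lowest BMI is drawn at size 3 and the highest at size 6, with the remaining points in between.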

The limits of the scale can also be controlled, but it is important to note that data outside the limits are removed from the plot.

pcPlot <- ggplot(data=patients_clean,
                 aes(x=Height,y=Weight,size=BMI))
pcPlot + geom_point() + scale_size_continuous(range = c(3,6), limits = c(25,40))
## Warning: Removed 23 rows containing missing values (geom_point).
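The number reported in a warning like this can be anticipated by counting the values that fall outside the limits. A minimal sketch in base R, using made-up BMI values rather than the course data:

```r
# Hypothetical BMI values standing in for patients_clean$BMI.
bmi <- c(21.4, 24.9, 25.0, 27.3, 31.8, 40.0, 42.5)
limits <- c(25, 40)

# Values equal to a limit are kept; values strictly outside are
# treated as missing by the scale.
outside <- bmi < limits[1] | bmi > limits[2]
n_removed <- sum(outside)
```

With these values three points would be dropped: 21.4, 24.9 and 42.5.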

The points of the scale to be labelled (the breaks) and the text of the labels can also be controlled.

pcPlot <- ggplot(data=patients_clean,
                 aes(x=Height,y=Weight,size=BMI))
pcPlot + geom_point() + scale_size_continuous(range = c(3,6) ,breaks=c(25,30),labels=c("Good","Good but not 25"))

Control of colour/fill scales can be best achieved through the gradient subfunctions of scale.

  • scale_(colour/fill)_gradient - Two colour gradient (e.g. low to high BMI)

  • scale_(colour/fill)_gradient2 - Diverging colour scale with a midpoint colour (e.g. Down, No Change, Up)

Both functions take a common set of arguments:

  • low - Colour for low end of gradient scale
  • high - Colour for high end of gradient scale.
  • na.value - Colour for any NA values.

The example below uses scale_colour_gradient to set the low and high end colours to White and Red respectively.

pcPlot <- ggplot(data=patients_clean,
                 aes(x=Height,y=Weight,colour=BMI))
pcPlot + geom_point(size=4,alpha=0.8) + 
  scale_colour_gradient(low = "White",high="Red")

Similarly we can use the scale_colour_gradient2 function which allows for the specification of a midpoint value and its associated colour.

pcPlot <- ggplot(data=patients_clean,
                 aes(x=Height,y=Weight,colour=BMI))
pcPlot + geom_point(size=4,alpha=0.8) + 
  scale_colour_gradient2(low = "Blue",mid="Black",high="Red",midpoint = median(patients_clean$BMI))

As with the previous continuous scales, limits and custom labels can be added to the scale legend.

pcPlot <- ggplot(data=patients_clean,
                 aes(x=Height,y=Weight,colour=BMI))
pcPlot + geom_point(size=4,alpha=0.8) + scale_colour_gradient2(low = "Blue",
                         mid="Black",
                         high="Red",
                         midpoint = median(patients_clean$BMI),
                         breaks=c(25,30),labels=c("Low","High"),
                         name="Body Mass Index")

Multiple scales may be combined to create highly customisable plots and scales.

pcPlot <- ggplot(data=patients_clean,
                 aes(x=Height,y=Weight,colour=BMI,shape=Sex))
pcPlot + geom_point(size=4,alpha=0.8) +
  scale_shape_discrete(name="Gender") +
  scale_colour_gradient2(low = "Blue",mid="Black",high="Red",
                         midpoint = median(patients_clean$BMI),
                         breaks=c(25,30),labels=c("Low","High"),
                         name="Body Mass Index")

Statistical transformations.

In ggplot2 many of the statistical transformations are performed without any direct specification, e.g. geom_histogram() will use the stat_bin() function to generate the bin counts used in the plot.

Examples of very useful statistical methods in ggplot2 include the stat_smooth() and stat_summary() functions.

The stat_smooth() function can be used to fit a line to the data being displayed.

pcPlot <- ggplot(data=patients_clean,
        mapping=aes(x=Weight,y=Height))
pcPlot+geom_point()+stat_smooth()
## `geom_smooth()` using method = 'loess'

By default a “loess” smooth line is plotted by stat_smooth() for data sets with fewer than 1,000 observations. Other methods available include lm, glm, gam and rlm.

pcPlot <- ggplot(data=patients_clean,
        mapping=aes(x=Weight,y=Height))
pcPlot+geom_point()+stat_smooth(method="lm")
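The line drawn with method="lm" is an ordinary linear model of y on x. The sketch below, which assumes a small made-up data frame (toy) in place of patients_clean, compares the line computed by stat_smooth() with a fit done by hand using lm():

```r
library(ggplot2)

# Hypothetical stand-in for patients_clean (not the course data).
toy <- data.frame(
  Weight = c(52, 55, 63, 70, 85, 95),
  Height = c(158, 160, 167, 172, 181, 190)
)

p <- ggplot(toy, aes(x = Weight, y = Height)) +
  geom_point() +
  stat_smooth(method = "lm", formula = y ~ x)

# Layer 2 holds the fitted line evaluated on a grid of x values.
smooth_line <- layer_data(p, 2)

# The same fit done by hand, predicted at the same x values.
fit <- lm(Height ~ Weight, data = toy)
by_hand <- predict(fit, newdata = data.frame(Weight = smooth_line$x))

# Largest difference between the two sets of fitted values.
max_diff <- max(abs(smooth_line$y - as.numeric(by_hand)))
```

max_diff should be zero up to floating point error, which is a useful reminder that coef(fit) and summary(fit) describe the line shown in the plot.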

A useful feature of ggplot2 is that it uses previously defined grouping when performing smoothing.

If colour by Sex is an aesthetic mapping then two smooth lines are drawn, one for each sex.

pcPlot <- ggplot(data=patients_clean,
        mapping=aes(x=Weight,y=Height,colour=Sex))
pcPlot+geom_point()+stat_smooth(method="lm")

This behaviour can be overridden by specifying an aes within the stat_smooth() function and setting inherit.aes to FALSE.

pcPlot <- ggplot(data=patients_clean,
        mapping=aes(x=Weight,y=Height,colour=Sex))
pcPlot+geom_point()+stat_smooth(aes(x=Weight,y=Height),method="lm",inherit.aes = FALSE)
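The effect of the grouping can be checked by counting how many separate lines the smoothing layer computes. This is a sketch only, using a made-up data frame (toy) rather than patients_clean:

```r
library(ggplot2)

# Hypothetical stand-in for patients_clean (not the course data).
toy <- data.frame(
  Weight = c(52, 55, 63, 60, 70, 85, 95, 78),
  Height = c(158, 160, 167, 162, 172, 181, 190, 176),
  Sex    = factor(rep(c("Female", "Male"), each = 4))
)

base <- ggplot(toy, aes(x = Weight, y = Height, colour = Sex)) +
  geom_point()

# Inherits colour = Sex, so one line is fitted per sex.
grouped <- base + stat_smooth(method = "lm", formula = y ~ x)

# Ignores the plot-level aesthetics, so one line is fitted to all points.
pooled <- base + stat_smooth(aes(x = Weight, y = Height),
                             method = "lm", formula = y ~ x,
                             inherit.aes = FALSE)

n_lines_grouped <- length(unique(layer_data(grouped, 2)$group))
n_lines_pooled  <- length(unique(layer_data(pooled, 2)$group))
```

The first plot contains two fitted lines and the second a single pooled line, while the points keep their colour by Sex in both.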

Another useful method is stat_summary() which allows for a custom statistical function to be performed and then visualised.

The fun.y parameter specifies a function to apply to the y values for every value of x. In ggplot2 3.3.0 and later this parameter is named fun, and fun.y is deprecated.

In this example we use it to plot the quantiles of the Female and Male Height data.

pcPlot <- ggplot(data=patients_clean,
        mapping=aes(x=Sex,y=Height))+geom_jitter()
pcPlot+stat_summary(fun.y=quantile,geom="point",colour="purple",size=8)
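The values plotted by this summary can be reproduced in base R, which helps when checking what the purple points represent. A minimal sketch with made-up heights (toy is a hypothetical stand-in for patients_clean):

```r
# Hypothetical heights in cm, five per sex.
toy <- data.frame(
  Sex    = factor(rep(c("Female", "Male"), each = 5)),
  Height = c(150, 155, 160, 165, 170,
             165, 170, 175, 180, 185)
)

# quantile() returns the 0%, 25%, 50%, 75% and 100% quantiles by default,
# so each sex gets five summary values.
height_quantiles <- tapply(toy$Height, toy$Sex, quantile)

female_quantiles <- height_quantiles[["Female"]]
```

Each group therefore contributes five points to the plot: the minimum, the quartiles and the maximum.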

Themes

Themes specify the details of data independent elements of the plot. This includes titles, background colour, text fonts etc.

The graphs created so far have all used the default theme, theme_grey(), but ggplot2 allows the theme to be specified.

Predefined themes

Predefined themes can be applied to a ggplot2 object using a family of functions named theme_style().

In the example below the minimal theme is applied to the scatter plot seen earlier.

pcPlot <- ggplot(data=patients_clean,
        mapping=aes(x=Weight,y=Height))+geom_point()
pcPlot+theme_minimal()

Several predefined themes are available within ggplot2 including:

  • theme_bw

  • theme_classic

  • theme_dark

  • theme_gray

  • theme_light

  • theme_linedraw

  • theme_minimal

Packages such as ggthemes also contain many useful collections of predefined theme_style functions.

Creating your own themes

As well as making use of predefined theme styles, ggplot2 allows for control over the attributes and elements within a plot through a collection of related functions and attributes.

theme() is the global function used to set attributes for the collections of elements/components making up the current plot.

Within the theme() function there are 4 general graphic elements which may be controlled:

  • rect
  • line
  • text
  • title

and 5 groups of related elements:-

  • axis
  • legend
  • strip
  • panel (plot panel)
  • plot (Global plot parameters)

These elements may be specified by the use of their appropriate element functions including:

  • element_line()
  • element_text()
  • element_rect()

and additionally element_blank() to set an element to “blank”

A detailed description of controlling elements within a theme can be seen at the ggplot2 vignette and by typing ?theme into the console.

To demonstrate customising a theme, in the example below we alter one element of theme. Here we will change the text colour for the plot.

pcPlot <- ggplot(data=patients_clean,
        mapping=aes(x=Weight,y=Height))+geom_point()
pcPlot+theme(
            text = element_text(colour="red"),
            axis.text = element_text(colour="red")
           )

  • Note that because we are changing a text element, we use the element_text() function.

A detailed description of which elements are available and their associated element functions can be found by typing ?theme.

If we wished to set the y-axis label to be at an angle we can adjust that as well.

pcPlot <- ggplot(data=patients_clean,
        mapping=aes(x=Weight,y=Height))+geom_point()
pcPlot+theme(
            text = element_text(colour="red"),
            axis.text = element_text(colour="red"),
            axis.title.y = element_text(angle=0)
           )

Finally we may wish to remove the axis lines, set the background of the plot panels to white and give the strips (the titles above the facets) a cyan background colour.

pcPlot <- ggplot(data=patients_clean,
        mapping=aes(x=Weight,y=Height))+
  geom_point()+
  facet_grid(Sex~Smokes)
pcPlot+theme(
            text = element_text(colour="red"),
            axis.text = element_text(colour="red"),
            axis.title.y = element_text(angle=0),
            axis.line = element_line(linetype = 0),
            panel.background=element_rect(fill="white"),
            strip.background=element_rect(fill="cyan")
           )

+ and %+replace%

When altering themes we have been using the + operator to add themes as we would when adding geoms, scales and stats.

When using the + operator

  • Theme elements specified in the new theme replace elements in the old theme.

  • Theme elements in the old theme which have not been specified in new theme are maintained.

This makes the + operator useful for building up from old themes.

In the example below, we maintain all elements set by theme_bw() but overwrite the theme element attribute of the colour of text.

pcPlot <- ggplot(data=patients_clean,
        mapping=aes(x=Weight,y=Height))+geom_point()
pcPlot+
  theme_bw()+
  theme(text = element_text(colour="red"))

The consequence can be seen by comparing the effect of theme() on a plot with the default theme and on a plot with theme_minimal().

Since the default theme, theme_grey(), contains its own specification for the axis.text colour, that colour is not replaced when the text colour is changed with the “+” operator.

pcPlot+
  theme(text = element_text(colour="red"))

pcPlot+
  theme_minimal()+
  theme(text = element_text(colour="red"))
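This inheritance can be inspected without drawing anything. The sketch below uses calc_element(), which resolves a theme element after inheritance has been applied; the variable names are illustrative only:

```r
library(ggplot2)

red_text <- theme(text = element_text(colour = "red"))

# Add the red text setting on top of the default theme.
thm <- theme_grey() + red_text

# The root text element picks up the new colour.
text_colour <- calc_element("text", thm)$colour

# axis.text has its own colour in theme_grey(), so it does not
# inherit the red from text.
axis_text_colour <- calc_element("axis.text", thm)$colour
```

Comparing text_colour with axis_text_colour shows why the axis tick labels stay grey unless axis.text is set explicitly, as in the earlier examples.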

In contrast %+replace% replaces the whole of each specified element, regardless of which of its properties were previously specified in the old theme.

When using the %+replace% operator

  • Theme elements specified in the new theme replace the corresponding elements in the old theme entirely.

  • Properties of those elements which have not been specified in the new theme are set to NULL rather than being maintained from the old theme.

oldTheme <- theme_bw()
newTheme_Plus <- theme_bw() +
  theme(text = element_text(colour="red"))
newTheme_Replace <- theme_bw() %+replace%
  theme(text = element_text(colour="red"))
oldTheme$text
## List of 11
##  $ family       : chr ""
##  $ face         : chr "plain"
##  $ colour       : chr "black"
##  $ size         : num 11
##  $ hjust        : num 0.5
##  $ vjust        : num 0.5
##  $ angle        : num 0
##  $ lineheight   : num 0.9
##  $ margin       :Classes 'margin', 'unit'  atomic [1:4] 0 0 0 0
##   .. ..- attr(*, "valid.unit")= int 8
##   .. ..- attr(*, "unit")= chr "pt"
##  $ debug        : logi FALSE
##  $ inherit.blank: logi TRUE
##  - attr(*, "class")= chr [1:2] "element_text" "element"
newTheme_Plus$text
## List of 11
##  $ family       : chr ""
##  $ face         : chr "plain"
##  $ colour       : chr "red"
##  $ size         : num 11
##  $ hjust        : num 0.5
##  $ vjust        : num 0.5
##  $ angle        : num 0
##  $ lineheight   : num 0.9
##  $ margin       :Classes 'margin', 'unit'  atomic [1:4] 0 0 0 0
##   .. ..- attr(*, "valid.unit")= int 8
##   .. ..- attr(*, "unit")= chr "pt"
##  $ debug        : logi FALSE
##  $ inherit.blank: logi FALSE
##  - attr(*, "class")= chr [1:2] "element_text" "element"
newTheme_Replace$text
## List of 11
##  $ family       : NULL
##  $ face         : NULL
##  $ colour       : chr "red"
##  $ size         : NULL
##  $ hjust        : NULL
##  $ vjust        : NULL
##  $ angle        : NULL
##  $ lineheight   : NULL
##  $ margin       : NULL
##  $ debug        : NULL
##  $ inherit.blank: logi FALSE
##  - attr(*, "class")= chr [1:2] "element_text" "element"

This means that %+replace% is most useful when creating new themes.

theme_get and theme_set

Adding titles and labels to plots.

So far no plot titles have been specified. Plot titles and axis labels can be specified using the labs() function.

pcPlot <- ggplot(data=patients_clean,
        mapping=aes(x=Weight,y=Height))+geom_point()
pcPlot+labs(title="Weight vs Height",y="Height (cm)")

or specified using the ggtitle and xlab/ylab functions.

pcPlot <- ggplot(data=patients_clean,
        mapping=aes(x=Weight,y=Height))+geom_point()
pcPlot+ggtitle("Weight vs Height")+ylab("Height (cm)")

Saving plots

Plots produced by ggplot can be saved from the interactive viewer as with standard plots.

The ggsave() function allows for additional arguments to be specified including the type, resolution and size of plot.

By default ggsave() will use the size of your current graphics window when saving plots, so it may be important to specify the desired width and height arguments.

pcPlot <- ggplot(data=patients_clean,
        mapping=aes(x=Weight,y=Height))+geom_point()
ggsave(pcPlot,filename = "anExampleplot.png",width = 15,height = 15,units = "cm")
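ggsave() chooses the graphics device from the file extension, so the same call can write other formats. A minimal sketch, assuming a made-up data frame (toy) and writing to a temporary directory so nothing is left in the working directory:

```r
library(ggplot2)

# Hypothetical stand-in for patients_clean (not the course data).
toy <- data.frame(
  Weight = c(52, 55, 63, 70, 85, 95),
  Height = c(158, 160, 167, 172, 181, 190)
)

p <- ggplot(toy, aes(x = Weight, y = Height)) + geom_point()

# The .pdf extension selects the PDF device.
out_file <- file.path(tempdir(), "anExampleplot.pdf")

# Naming both filename and plot avoids relying on argument order.
ggsave(filename = out_file, plot = p,
       width = 15, height = 15, units = "cm")
```

Passing the plot explicitly also avoids depending on the last plot displayed, which is what ggsave() saves when no plot is given.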

Exercise set 2

Link_to_exercises

Link_to_exercises_with_images

Link_to_Rmarkdown template

Link_to_answers

References.